Policy Search using Paired Comparisons

Authors

Malcolm J. A. Strens

Andrew W. Moore

Abstract

Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng and Jordan, 2000). We evaluate Pegasus, and new paired comparison methods, using the mountain car problem, and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve performance of optimization procedures; (ii) several methods are available to reduce the ‘overfitting’ effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
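The pairing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the one-parameter policy, the toy return function `rollout_return`, the two parameter values, and the seed ranges are all assumptions chosen for illustration. The sketch only shows why evaluating two policies on the same start states and random number sequences tends to give a less noisy comparison than evaluating them on independent ones.

```python
import random
import statistics


def rollout_return(theta, seed):
    """Hypothetical noisy return of a one-parameter policy for one trial.

    The seed fixes both the start state and the noise draw, so two
    policies evaluated with the same seed face the same scenario.
    """
    rng = random.Random(seed)
    start_state = rng.uniform(-1.0, 1.0)  # random start state
    noise = rng.gauss(0.0, 1.0)           # environment noise
    # Toy objective (an assumption): the best parameter is near theta = 1.
    return (-(theta - 1.0) ** 2
            + 2.0 * start_state
            + 0.1 * theta * start_state
            + noise)


def paired_differences(theta_a, theta_b, seeds):
    """Return differences when both policies share each scenario (seed)."""
    return [rollout_return(theta_a, s) - rollout_return(theta_b, s)
            for s in seeds]


def unpaired_differences(theta_a, theta_b, seeds_a, seeds_b):
    """Return differences when each policy gets its own scenarios."""
    return [rollout_return(theta_a, sa) - rollout_return(theta_b, sb)
            for sa, sb in zip(seeds_a, seeds_b)]


seeds = list(range(20))
paired = paired_differences(0.9, 0.5, seeds)
unpaired = unpaired_differences(0.9, 0.5, seeds, [s + 1000 for s in seeds])

print("paired   mean, stdev:",
      statistics.mean(paired), statistics.pstdev(paired))
print("unpaired mean, stdev:",
      statistics.mean(unpaired), statistics.pstdev(unpaired))
```

In this toy setting most of the noise is shared between the two policies and cancels in the paired differences, so the better policy can be identified from few trials. In a real control problem the cancellation is only partial, and reusing one fixed set of scenarios can lead to the overfitting effect the abstract mentions.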

Similar resources

Direct Policy Search using Paired Statistical Tests

Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng & Jor...

How to Analyze Paired Comparison Data

Thurstone’s Law of Comparative Judgment provides a method to convert subjective paired comparisons into one-dimensional quality scores. Applications include judging quality of different image reconstructions, or different products, or different web search results, etc. This tutorial covers the popular Thurstone-Mosteller Case V model and the Bradley-Terry logistic variant. We describe three app...

Fitting loglinear Bradley-Terry models (LLBT) for paired comparisons using the R package prefmod

This paper aims at introducing the R package prefmod (Hatzinger, 2009) which allows the user to fit various models to paired comparison data. These models give estimated overall rankings of objects or items where each subject (respondent/judge) makes one or more comparisons between pairs of objects (items). The focus is on the loglinear Bradley-Terry (LLBT) model, the loglinear formulation of t...

Thurstone's Case V model: A structural equations modeling perspective

Modeling how we choose among alternatives, or more generally, modeling preferences, is one of the core topics of study in Psychology. Preferences can be studied experimentally using a variety of procedures, one of the oldest being the method of paired comparisons. This method remains quite popular in areas such as psychophysics and consumer psychology. For a good overview of the method of paire...

End-to-End Training of Deep Visuomotor Policies

Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training...

Journal:

Journal of Machine Learning Research

Volume 3

Publication date: 2002
